The purpose of this analysis document is to ensure the reproducibility of the results by guiding the reader through the random forest analysis of the factors associated with the health of western redcedar.
Root data were shared by citizen scientists in the Western Redcedar Dieback Map project on iNaturalist.
All of the data used in the analyses and visualizations below are described in the Data Wrangle folder.
All tree health categories
## # A tibble: 11 x 2
## # Groups: field.tree.canopy.symptoms [11]
## field.tree.canopy.symptoms n
## <fct> <int>
## 1 Branch Dieback or 'Flagging' 19
## 2 Browning Canopy 19
## 3 Extra Cone Crop 2
## 4 Healthy 403
## 5 Multiple Symptoms (please list in Notes) 17
## 6 New Dead Top (red or brown needles still attached) 33
## 7 Old Dead Top (needles already gone) 83
## 8 Other (please describe in Notes) 8
## 9 Thinning Canopy 118
## 10 Tree is dead 37
## 11 Yellowing Canopy 10
We need to filter the data to only include response and explanatory variables we’re interested in. For example, whether a sound clip was included in the iNat data is not important.
We also need to remove other response variables like “field.percent.canopy.affected….” so it is not used as a predictor for tree health.
Note it might be interesting to know if the user was an important factor in predicting if the tree is healthy/unhealthy.
There are also a number of factors that should probably be removed because they may be biasing the data. For example, the ‘other factor’ question may only be answered for unhealthy trees. We need to think about this a bit more.
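The filtering step described above could look something like the following minimal sketch in base R. The toy data frame and every column name except field.tree.canopy.symptoms are hypothetical stand-ins. In particular, the suffix on the field.percent.canopy.affected column is made up, so the column is matched by prefix only.

```r
# Toy stand-in for the iNaturalist export. All column names except
# field.tree.canopy.symptoms are hypothetical.
trees <- data.frame(
  field.tree.canopy.symptoms = factor(c("Healthy", "Thinning Canopy", "Healthy")),
  field.percent.canopy.affected.example = c(0, 40, 0),  # another response variable
  sound_url = c(NA, NA, "clip.mp3"),                    # irrelevant metadata
  user_login = c("a", "b", "a"),                        # kept: may be informative
  MAP = c(1200, 950, 1100)
)

# Flag other response variables by prefix, and irrelevant metadata by name.
is_other_response <- startsWith(names(trees), "field.percent.canopy.affected")
is_irrelevant <- names(trees) %in% c("sound_url")

# Keep the response of interest plus the remaining candidate predictors.
model_data <- trees[, !(is_other_response | is_irrelevant), drop = FALSE]
names(model_data)
```

Matching by prefix avoids hard-coding the full column name, and the user column is kept here because it may be worth testing as a predictor.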
We continue to get the below error, but were able to work around it by imputing the data.
Error in randomForest.default(m, y, …) : Need at least two classes to do classification.
To impute the data we have to remove factors with >53 levels.
The below code lists the number of levels for the variables that are factors.
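A minimal sketch of that step, using a made-up data frame (all column names here are hypothetical): count the levels of each factor column, then drop any factor with more than 53 levels, which is the limit randomForest imposes on categorical predictors.

```r
# Toy data: site_id has 100 levels, which exceeds the 53-level limit.
df <- data.frame(
  health = factor(rep_len(c("Healthy", "Unhealthy"), 100)),
  site_id = factor(paste0("site", 1:100)),
  soil_order = factor(rep_len(c("A", "B", "C"), 100)),
  MAP = seq(800, 2500, length.out = 100)
)

# Number of levels for each factor column.
is_fac <- vapply(df, is.factor, logical(1))
n_levels <- vapply(df[is_fac], nlevels, integer(1))
print(n_levels)

# Drop factor columns with more than 53 levels.
too_many <- names(n_levels)[n_levels > 53]
df_small <- df[, setdiff(names(df), too_many), drop = FALSE]
names(df_small)
```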
Imputed data table
## ntree OOB 1 2 3 4 5 6 7 8 9 10 11
## 300: 45.53% 94.74% 94.74%100.00% 15.63% 76.47% 87.88% 72.29%100.00% 72.03% 94.59%100.00%
## ntree OOB 1 2 3 4 5 6 7 8 9 10 11
## 300: 46.06% 94.74% 94.74%100.00% 15.88% 88.24% 87.88% 73.49%100.00% 72.88% 91.89%100.00%
## ntree OOB 1 2 3 4 5 6 7 8 9 10 11
## 300: 44.73% 94.74% 94.74%100.00% 15.38% 76.47% 87.88% 72.29%100.00% 68.64% 91.89%100.00%
## ntree OOB 1 2 3 4 5 6 7 8 9 10 11
## 300: 46.06%100.00% 94.74%100.00% 15.88% 82.35% 87.88% 74.70%100.00% 72.88% 89.19%100.00%
## ntree OOB 1 2 3 4 5 6 7 8 9 10 11
## 300: 45.79% 94.74%100.00%100.00% 15.38% 82.35% 87.88% 72.29%100.00% 73.73% 91.89%100.00%
## ntree OOB 1 2 3 4 5 6 7 8 9 10 11
## 300: 45.53%100.00% 94.74%100.00% 14.14% 82.35% 87.88% 75.90%100.00% 73.73% 91.89%100.00%
##
## Call:
## randomForest(formula = field.tree.canopy.symptoms ~ ., data = training, ntree = 2001, importance = TRUE, proximity = TRUE, na.action = na.omit)
## Type of random forest: classification
## Number of trees: 2001
## No. of variables tried at each split: 23
##
## OOB estimate of error rate: 47.24%
## Confusion matrix:
## Branch Dieback or 'Flagging'
## Branch Dieback or 'Flagging' 1
## Browning Canopy 0
## Extra Cone Crop 0
## Healthy 2
## Multiple Symptoms (please list in Notes) 0
## New Dead Top (red or brown needles still attached) 1
## Old Dead Top (needles already gone) 1
## Other (please describe in Notes) 0
## Thinning Canopy 3
## Tree is dead 0
## Yellowing Canopy 0
## Browning Canopy
## Branch Dieback or 'Flagging' 0
## Browning Canopy 0
## Extra Cone Crop 0
## Healthy 2
## Multiple Symptoms (please list in Notes) 0
## New Dead Top (red or brown needles still attached) 1
## Old Dead Top (needles already gone) 1
## Other (please describe in Notes) 0
## Thinning Canopy 0
## Tree is dead 2
## Yellowing Canopy 0
## Extra Cone Crop Healthy
## Branch Dieback or 'Flagging' 0 10
## Browning Canopy 0 9
## Extra Cone Crop 0 0
## Healthy 0 254
## Multiple Symptoms (please list in Notes) 0 6
## New Dead Top (red or brown needles still attached) 0 16
## Old Dead Top (needles already gone) 0 17
## Other (please describe in Notes) 0 4
## Thinning Canopy 0 51
## Tree is dead 0 15
## Yellowing Canopy 0 6
## Multiple Symptoms (please list in Notes)
## Branch Dieback or 'Flagging' 0
## Browning Canopy 0
## Extra Cone Crop 0
## Healthy 0
## Multiple Symptoms (please list in Notes) 3
## New Dead Top (red or brown needles still attached) 0
## Old Dead Top (needles already gone) 1
## Other (please describe in Notes) 0
## Thinning Canopy 0
## Tree is dead 0
## Yellowing Canopy 0
## New Dead Top (red or brown needles still attached)
## Branch Dieback or 'Flagging' 0
## Browning Canopy 1
## Extra Cone Crop 0
## Healthy 7
## Multiple Symptoms (please list in Notes) 0
## New Dead Top (red or brown needles still attached) 0
## Old Dead Top (needles already gone) 4
## Other (please describe in Notes) 0
## Thinning Canopy 1
## Tree is dead 2
## Yellowing Canopy 1
## Old Dead Top (needles already gone)
## Branch Dieback or 'Flagging' 2
## Browning Canopy 1
## Extra Cone Crop 0
## Healthy 9
## Multiple Symptoms (please list in Notes) 1
## New Dead Top (red or brown needles still attached) 4
## Old Dead Top (needles already gone) 22
## Other (please describe in Notes) 0
## Thinning Canopy 15
## Tree is dead 4
## Yellowing Canopy 0
## Other (please describe in Notes)
## Branch Dieback or 'Flagging' 0
## Browning Canopy 0
## Extra Cone Crop 0
## Healthy 1
## Multiple Symptoms (please list in Notes) 0
## New Dead Top (red or brown needles still attached) 0
## Old Dead Top (needles already gone) 0
## Other (please describe in Notes) 0
## Thinning Canopy 0
## Tree is dead 0
## Yellowing Canopy 0
## Thinning Canopy Tree is dead
## Branch Dieback or 'Flagging' 3 0
## Browning Canopy 0 2
## Extra Cone Crop 1 0
## Healthy 18 6
## Multiple Symptoms (please list in Notes) 1 0
## New Dead Top (red or brown needles still attached) 2 1
## Old Dead Top (needles already gone) 12 4
## Other (please describe in Notes) 0 0
## Thinning Canopy 15 4
## Tree is dead 8 1
## Yellowing Canopy 1 0
## Yellowing Canopy class.error
## Branch Dieback or 'Flagging' 0 0.9375000
## Browning Canopy 0 1.0000000
## Extra Cone Crop 0 1.0000000
## Healthy 1 0.1533333
## Multiple Symptoms (please list in Notes) 0 0.7272727
## New Dead Top (red or brown needles still attached) 0 1.0000000
## Old Dead Top (needles already gone) 0 0.6451613
## Other (please describe in Notes) 0 1.0000000
## Thinning Canopy 0 0.8314607
## Tree is dead 0 0.9687500
## Yellowing Canopy 0 1.0000000
Selected tree health categories
## # A tibble: 5 x 2
## # Groups: field.tree.canopy.symptoms [5]
## field.tree.canopy.symptoms n
## <fct> <int>
## 1 Healthy 403
## 2 New Dead Top (red or brown needles still attached) 33
## 3 Old Dead Top (needles already gone) 83
## 4 Thinning Canopy 118
## 5 Tree is dead 37
##
## Call:
## randomForest(formula = field.tree.canopy.symptoms ~ ., data = training, ntree = 2001, importance = TRUE, proximity = TRUE, na.action = na.omit)
## Type of random forest: classification
## Number of trees: 2001
## No. of variables tried at each split: 23
##
## OOB estimate of error rate: 38.02%
## Confusion matrix:
## Healthy
## Healthy 274
## New Dead Top (red or brown needles still attached) 16
## Old Dead Top (needles already gone) 25
## Thinning Canopy 45
## Tree is dead 19
## New Dead Top (red or brown needles still attached)
## Healthy 4
## New Dead Top (red or brown needles still attached) 0
## Old Dead Top (needles already gone) 0
## Thinning Canopy 1
## Tree is dead 2
## Old Dead Top (needles already gone)
## Healthy 10
## New Dead Top (red or brown needles still attached) 2
## Old Dead Top (needles already gone) 15
## Thinning Canopy 16
## Tree is dead 3
## Thinning Canopy Tree is dead
## Healthy 17 7
## New Dead Top (red or brown needles still attached) 3 1
## Old Dead Top (needles already gone) 15 1
## Thinning Canopy 22 2
## Tree is dead 3 2
## class.error
## Healthy 0.1217949
## New Dead Top (red or brown needles still attached) 1.0000000
## Old Dead Top (needles already gone) 0.7321429
## Thinning Canopy 0.7441860
## Tree is dead 0.9310345
Binary tree health categories
## # A tibble: 2 x 2
## # Groups: field.tree.canopy.symptoms [2]
## field.tree.canopy.symptoms n
## <fct> <int>
## 1 Healthy 403
## 2 Unhealthy 346
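The binary counts above are consistent with grouping every category other than Healthy into a single Unhealthy class (749 observations in total, 403 of them Healthy). Assuming that is how the binary response was built, the recoding can be sketched as follows. The observations are reconstructed from the category counts reported earlier rather than read from the real data.

```r
# Category counts copied from the "All tree health categories" table.
counts <- c(
  "Branch Dieback or 'Flagging'" = 19,
  "Browning Canopy" = 19,
  "Extra Cone Crop" = 2,
  "Healthy" = 403,
  "Multiple Symptoms (please list in Notes)" = 17,
  "New Dead Top (red or brown needles still attached)" = 33,
  "Old Dead Top (needles already gone)" = 83,
  "Other (please describe in Notes)" = 8,
  "Thinning Canopy" = 118,
  "Tree is dead" = 37,
  "Yellowing Canopy" = 10
)

# Rebuild one label per observation, then collapse to two classes.
symptoms <- factor(rep(names(counts), times = counts))
binary <- factor(ifelse(symptoms == "Healthy", "Healthy", "Unhealthy"))
table(binary)
```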
##
## Call:
## randomForest(formula = field.tree.canopy.symptoms ~ ., data = training, ntree = 2001, importance = TRUE, proximity = TRUE, na.action = na.omit)
## Type of random forest: classification
## Number of trees: 2001
## No. of variables tried at each split: 23
##
## OOB estimate of error rate: 29.06%
## Confusion matrix:
## Healthy Unhealthy class.error
## Healthy 231 76 0.2475570
## Unhealthy 87 167 0.3425197
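As a check on how to read this output, the class.error column and the OOB error rate can be recomputed from the confusion matrix itself. Rows are observed classes and columns are predicted classes. The sketch below uses only the four counts printed above.

```r
# Binary confusion matrix from the output above (rows = observed).
cm <- matrix(c(231, 76,
               87, 167),
             nrow = 2, byrow = TRUE,
             dimnames = list(observed = c("Healthy", "Unhealthy"),
                             predicted = c("Healthy", "Unhealthy")))

# Per-class error: share of each observed class that was misclassified.
class_error <- 1 - diag(cm) / rowSums(cm)

# Overall OOB error: share of all observations that were misclassified.
oob_error <- 1 - sum(diag(cm)) / sum(cm)

round(class_error, 7)
round(100 * oob_error, 2)
```

Unhealthy trees are misclassified as Healthy more often (about 34%) than Healthy trees are misclassified as Unhealthy (about 25%).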
## left daughter right daughter split var
## 1 2 3 component_taxgrtgroup
## 2 4 5 norm_Tmin01
## 3 6 7 Profilecurv.x
## 4 8 9 TD
## 5 10 11 CMI11
## 6 12 13 component_nirrcapscl
## 7 0 0 <NA>
## 8 14 15 component_irrcapcl
## 9 16 17 component_elev_r
## 10 18 19 EMT
## 11 20 21 norm_SHM
## 12 0 0 <NA>
## 13 22 23 Eref10
## 14 24 25 component_ffd_l
## 15 26 27 component_elev_r
## 16 28 29 norm_EMT
## 17 30 31 Slope.x
## 18 32 33 norm_CMD09
## 19 0 0 <NA>
## 20 34 35 norm_Tmin_sm
## 21 36 37 component_taxreaction
## 22 38 39 norm_bFFP
## 23 40 41 DD18_sm
## 24 42 43 component_cokey
## 25 44 45 CMD_sp
## 26 46 47 PPT02
## 27 48 49 PAS02
## 28 50 51 component_totalsub_h
## 29 0 0 <NA>
## 30 52 53 norm_DD5_01
## 31 54 55 component_map_r
## 32 0 0 <NA>
## 33 56 57 norm_CMI04
## 34 0 0 <NA>
## 35 0 0 <NA>
## 36 0 0 <NA>
## 37 0 0 <NA>
## 38 0 0 <NA>
## 39 58 59 component_map_l
## 40 60 61 CMD_sm
## 41 62 63 Tmin08
## 42 64 65 Tmax11
## 43 66 67 norm_PPT03
## 44 68 69 muaggatt_iccdcd
## 45 0 0 <NA>
## 46 0 0 <NA>
## 47 0 0 <NA>
## 48 70 71 muaggatt_hydclprs
## 49 0 0 <NA>
## 50 72 73 component_slope_r
## 51 74 75 component_taxceactcl
## 52 76 77 muaggatt_slopegraddcp
## 53 0 0 <NA>
## 54 78 79 norm_RH05
## 55 80 81 norm_Tave11
## 56 0 0 <NA>
## 57 82 83 bFFP
## 58 84 85 component_elev_l
## 59 0 0 <NA>
## 60 0 0 <NA>
## 61 0 0 <NA>
## 62 86 87 norm_Tave04
## 63 88 89 component_taxorder
## 64 90 91 DD5_08
## 65 92 93 norm_CMI04
## 66 0 0 <NA>
## 67 94 95 muaggatt_slopegradwta
## 68 96 97 component_cokey
## 69 98 99 norm_Tmin_sm
## 70 100 101 norm_CMI08
## 71 0 0 <NA>
## 72 102 103 norm_RH06
## 73 104 105 norm_MAP
## 74 106 107 norm_PPT06
## 75 0 0 <NA>
## 76 0 0 <NA>
## 77 108 109 norm_NFFD11
## 78 0 0 <NA>
## 79 0 0 <NA>
## 80 110 111 norm_PAS02
## 81 112 113 component_irrcapcl
## 82 0 0 <NA>
## 83 114 115 norm_PAS
## 84 116 117 norm_NFFD
## 85 118 119 norm_MAP
## 86 120 121 field.optional...site.area.disturbance.level
## 87 0 0 <NA>
## 88 122 123 norm_DD5_wt
## 89 0 0 <NA>
## 90 124 125 norm_CMI_wt
## 91 0 0 <NA>
## 92 126 127 PPT01
## 93 0 0 <NA>
## 94 0 0 <NA>
## 95 0 0 <NA>
## 96 0 0 <NA>
## 97 128 129 Aspect.x
## 98 0 0 <NA>
## 99 0 0 <NA>
## 100 0 0 <NA>
## 101 130 131 norm_PPT_at
## 102 0 0 <NA>
## 103 132 133 norm_MAP
## 104 0 0 <NA>
## 105 0 0 <NA>
## 106 0 0 <NA>
## 107 0 0 <NA>
## 108 0 0 <NA>
## 109 0 0 <NA>
## 110 0 0 <NA>
## 111 134 135 CMI04
## 112 136 137 muaggatt_wtdepannmin
## 113 138 139 norm_CMD07
## 114 0 0 <NA>
## 115 0 0 <NA>
## 116 140 141 muaggatt_engdwbdcd
## 117 0 0 <NA>
## 118 142 143 norm_CMI03
## 119 0 0 <NA>
## 120 0 0 <NA>
## 121 0 0 <NA>
## 122 0 0 <NA>
## 123 144 145 DD5_sp
## 124 0 0 <NA>
## 125 0 0 <NA>
## 126 0 0 <NA>
## 127 0 0 <NA>
## 128 0 0 <NA>
## 129 0 0 <NA>
## 130 146 147 Tmin_wt
## 131 148 149 muaggatt_niccdcdpct
## 132 0 0 <NA>
## 133 150 151 norm_Eref05
## 134 152 153 CMD_at
## 135 0 0 <NA>
## 136 0 0 <NA>
## 137 0 0 <NA>
## 138 0 0 <NA>
## 139 0 0 <NA>
## 140 154 155 MAP
## 141 0 0 <NA>
## 142 0 0 <NA>
## 143 156 157 DD18_sm
## 144 0 0 <NA>
## 145 0 0 <NA>
## 146 0 0 <NA>
## 147 0 0 <NA>
## 148 158 159 norm_Tmax_at
## 149 0 0 <NA>
## 150 0 0 <NA>
## 151 0 0 <NA>
## 152 0 0 <NA>
## 153 0 0 <NA>
## 154 0 0 <NA>
## 155 160 161 NFFD04
## 156 162 163 norm_Tmin04
## 157 164 165 component_slopelenusle_h
## 158 0 0 <NA>
## 159 0 0 <NA>
## 160 0 0 <NA>
## 161 0 0 <NA>
## 162 0 0 <NA>
## 163 0 0 <NA>
## 164 0 0 <NA>
## 165 166 167 norm_DD_0_11
## 166 0 0 <NA>
## 167 0 0 <NA>
## split point status prediction
## 1 1.757813e+14 1 <NA>
## 2 2.550000e+00 1 <NA>
## 3 5.012247e-03 1 <NA>
## 4 1.535000e+01 1 <NA>
## 5 1.337000e+01 1 <NA>
## 6 1.000000e+00 1 <NA>
## 7 0.000000e+00 -1 Unhealthy
## 8 3.268113e+00 1 <NA>
## 9 1.030000e+02 1 <NA>
## 10 -1.095000e+01 1 <NA>
## 11 1.025000e+02 1 <NA>
## 12 0.000000e+00 -1 Unhealthy
## 13 4.650000e+01 1 <NA>
## 14 1.698866e+02 1 <NA>
## 15 3.200000e+01 1 <NA>
## 16 -1.175000e+01 1 <NA>
## 17 2.406433e+00 1 <NA>
## 18 2.550000e+01 1 <NA>
## 19 0.000000e+00 -1 Unhealthy
## 20 1.245000e+01 1 <NA>
## 21 1.000000e+00 1 <NA>
## 22 6.650000e+01 1 <NA>
## 23 1.895000e+02 1 <NA>
## 24 2.083329e+07 1 <NA>
## 25 5.200000e+01 1 <NA>
## 26 1.160000e+02 1 <NA>
## 27 6.800000e+01 1 <NA>
## 28 4.553932e+01 1 <NA>
## 29 0.000000e+00 -1 Healthy
## 30 5.050000e+01 1 <NA>
## 31 1.206500e+03 1 <NA>
## 32 0.000000e+00 -1 Unhealthy
## 33 2.675000e+00 1 <NA>
## 34 0.000000e+00 -1 Healthy
## 35 0.000000e+00 -1 Unhealthy
## 36 0.000000e+00 -1 Healthy
## 37 0.000000e+00 -1 Unhealthy
## 38 0.000000e+00 -1 Unhealthy
## 39 2.476500e+03 1 <NA>
## 40 3.425000e+02 1 <NA>
## 41 1.355000e+01 1 <NA>
## 42 1.025000e+01 1 <NA>
## 43 5.750000e+01 1 <NA>
## 44 3.144162e+00 1 <NA>
## 45 0.000000e+00 -1 Unhealthy
## 46 0.000000e+00 -1 Healthy
## 47 0.000000e+00 -1 Unhealthy
## 48 3.000000e+00 1 <NA>
## 49 0.000000e+00 -1 Healthy
## 50 1.150000e+01 1 <NA>
## 51 1.000000e+00 1 <NA>
## 52 8.000000e+00 1 <NA>
## 53 0.000000e+00 -1 Healthy
## 54 5.900000e+01 1 <NA>
## 55 4.650000e+00 1 <NA>
## 56 0.000000e+00 -1 Unhealthy
## 57 6.650000e+01 1 <NA>
## 58 5.500000e+00 1 <NA>
## 59 0.000000e+00 -1 Unhealthy
## 60 0.000000e+00 -1 Unhealthy
## 61 0.000000e+00 -1 Healthy
## 62 1.055000e+01 1 <NA>
## 63 2.000000e+00 1 <NA>
## 64 4.315000e+02 1 <NA>
## 65 3.965000e+00 1 <NA>
## 66 0.000000e+00 -1 Healthy
## 67 6.025000e+01 1 <NA>
## 68 2.079720e+07 1 <NA>
## 69 1.115000e+01 1 <NA>
## 70 -8.060000e+00 1 <NA>
## 71 0.000000e+00 -1 Unhealthy
## 72 6.350000e+01 1 <NA>
## 73 1.342000e+03 1 <NA>
## 74 7.100000e+01 1 <NA>
## 75 0.000000e+00 -1 Unhealthy
## 76 0.000000e+00 -1 Unhealthy
## 77 1.950000e+01 1 <NA>
## 78 0.000000e+00 -1 Healthy
## 79 0.000000e+00 -1 Unhealthy
## 80 2.800000e+01 1 <NA>
## 81 2.591183e+00 1 <NA>
## 82 0.000000e+00 -1 Healthy
## 83 1.750000e+01 1 <NA>
## 84 3.225000e+02 1 <NA>
## 85 1.163500e+03 1 <NA>
## 86 2.000000e+00 1 <NA>
## 87 0.000000e+00 -1 Unhealthy
## 88 1.695000e+02 1 <NA>
## 89 0.000000e+00 -1 Healthy
## 90 4.787500e+01 1 <NA>
## 91 0.000000e+00 -1 Healthy
## 92 1.405000e+02 1 <NA>
## 93 0.000000e+00 -1 Unhealthy
## 94 0.000000e+00 -1 Unhealthy
## 95 0.000000e+00 -1 Healthy
## 96 0.000000e+00 -1 Unhealthy
## 97 8.480367e+01 1 <NA>
## 98 0.000000e+00 -1 Healthy
## 99 0.000000e+00 -1 Unhealthy
## 100 0.000000e+00 -1 Healthy
## 101 3.220000e+02 1 <NA>
## 102 0.000000e+00 -1 Unhealthy
## 103 1.134500e+03 1 <NA>
## 104 0.000000e+00 -1 Healthy
## 105 0.000000e+00 -1 Unhealthy
## 106 0.000000e+00 -1 Healthy
## 107 0.000000e+00 -1 Unhealthy
## 108 0.000000e+00 -1 Unhealthy
## 109 0.000000e+00 -1 Healthy
## 110 0.000000e+00 -1 Unhealthy
## 111 1.216500e+01 1 <NA>
## 112 7.233230e+01 1 <NA>
## 113 1.410000e+02 1 <NA>
## 114 0.000000e+00 -1 Healthy
## 115 0.000000e+00 -1 Healthy
## 116 1.200000e+01 1 <NA>
## 117 0.000000e+00 -1 Healthy
## 118 8.405000e+00 1 <NA>
## 119 0.000000e+00 -1 Healthy
## 120 0.000000e+00 -1 Healthy
## 121 0.000000e+00 -1 Unhealthy
## 122 0.000000e+00 -1 Healthy
## 123 6.070000e+02 1 <NA>
## 124 0.000000e+00 -1 Unhealthy
## 125 0.000000e+00 -1 Healthy
## 126 0.000000e+00 -1 Unhealthy
## 127 0.000000e+00 -1 Healthy
## 128 0.000000e+00 -1 Unhealthy
## 129 0.000000e+00 -1 Healthy
## 130 1.900000e+00 1 <NA>
## 131 8.250000e+01 1 <NA>
## 132 0.000000e+00 -1 Unhealthy
## 133 1.040000e+02 1 <NA>
## 134 2.200000e+01 1 <NA>
## 135 0.000000e+00 -1 Healthy
## 136 0.000000e+00 -1 Healthy
## 137 0.000000e+00 -1 Unhealthy
## 138 0.000000e+00 -1 Healthy
## 139 0.000000e+00 -1 Unhealthy
## 140 1.085500e+03 1 <NA>
## 141 0.000000e+00 -1 Healthy
## 142 0.000000e+00 -1 Healthy
## 143 1.925000e+02 1 <NA>
## 144 0.000000e+00 -1 Unhealthy
## 145 0.000000e+00 -1 Healthy
## 146 0.000000e+00 -1 Healthy
## 147 0.000000e+00 -1 Unhealthy
## 148 1.530000e+01 1 <NA>
## 149 0.000000e+00 -1 Unhealthy
## 150 0.000000e+00 -1 Unhealthy
## 151 0.000000e+00 -1 Healthy
## 152 0.000000e+00 -1 Unhealthy
## 153 0.000000e+00 -1 Healthy
## 154 0.000000e+00 -1 Unhealthy
## 155 2.650000e+01 1 <NA>
## 156 5.150000e+00 1 <NA>
## 157 9.050829e+01 1 <NA>
## 158 0.000000e+00 -1 Healthy
## 159 0.000000e+00 -1 Unhealthy
## 160 0.000000e+00 -1 Unhealthy
## 161 0.000000e+00 -1 Healthy
## 162 0.000000e+00 -1 Unhealthy
## 163 0.000000e+00 -1 Healthy
## 164 0.000000e+00 -1 Unhealthy
## 165 5.500000e+00 1 <NA>
## 166 0.000000e+00 -1 Healthy
## 167 0.000000e+00 -1 Unhealthy
Fit a single recursive partitioning or classification tree. We followed the instructions in this YouTube video.
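A minimal sketch of fitting a single classification tree is shown below. It assumes the rpart package, which may not be the package used in the video, and it uses the built-in iris data rather than the redcedar data.

```r
library(rpart)

# Fit one classification tree with Species as the response.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict classes for the same rows and tabulate against the observed labels.
pred <- predict(fit, newdata = iris, type = "class")
table(observed = iris$Species, predicted = pred)
```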
Below is an example of one of the trees included in the random forest.
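The column layout of the tree table (left daughter, right daughter, split var, split point, status, prediction) matches what getTree() from the randomForest package returns with labelVar = TRUE. Assuming that is how it was produced, a small sketch on the built-in iris data looks like this. The seed and ntree value are arbitrary choices for illustration.

```r
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 25)

# Extract the first tree; labelVar = TRUE gives variable names and class labels.
tree1 <- getTree(rf, k = 1, labelVar = TRUE)
head(tree1)

# Terminal nodes have status -1, no split variable, and a predicted class.
leaves <- tree1[tree1$status == -1, ]
table(leaves$prediction)
```

According to the getTree documentation, the split point for a categorical predictor is an integer whose binary expansion identifies which categories go left or right. That would explain a value like 1.757813e+14 for component_taxgrtgroup, assuming that variable is a factor.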
Error in randomForest.default(m, y, …) : Need at least two classes to do classification.
I may be misunderstanding this error, but I think it is referring to the response variable?
The documentation here shows that the error is raised by the following check: if (classRF && !addclass && length(unique(y)) < 2) stop("Need at least two classes to do classification.")
It is possible some of the NA or -9999 values are causing issues.
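One way this could happen, offered as a hypothesis rather than a confirmed diagnosis: with na.action = na.omit, every row containing a missing value is dropped before fitting, and the rows that remain may all belong to one class. The sketch below uses a made-up data frame with hypothetical column names and treats -9999 as a missing-value sentinel. It shows how to check how many classes survive.

```r
# Toy data: every Unhealthy row has a missing or -9999 value somewhere.
df <- data.frame(
  health = factor(c("Healthy", "Healthy", "Unhealthy", "Unhealthy")),
  elev = c(120, 300, -9999, 80),
  MAP = c(1100, 950, 1300, NA)
)

# Convert the -9999 sentinel to NA in numeric columns (assumed convention).
num_cols <- vapply(df, is.numeric, logical(1))
df[num_cols] <- lapply(df[num_cols], function(x) {
  x[!is.na(x) & x == -9999] <- NA
  x
})

# Rows that would survive na.omit, and the classes left among them.
complete <- na.omit(df)
table(complete$health)
n_classes <- length(unique(complete$health))
n_classes
```

If n_classes is below 2 on the real training data, that would account for the error and would also explain why imputing the missing values makes it go away.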
We can try imputing the data; however, this requires us to remove factor columns with more than 53 levels, which probably makes sense anyway.
Removing factors with more than 53 levels didn’t resolve the error from the randomForest command, but it did allow us to use the rfImpute command to impute our data.
It actually worked once the data were imputed.
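The impute-then-fit workflow can be sketched as follows on the built-in iris data with some values artificially set to missing. The seed, the number of missing values, and the ntree values are arbitrary choices for illustration, not the settings used for the redcedar models.

```r
library(randomForest)

set.seed(1)

# Introduce missing values into two predictors (the response must be complete).
iris_na <- iris
iris_na[sample(nrow(iris_na), 20), "Sepal.Length"] <- NA
iris_na[sample(nrow(iris_na), 20), "Petal.Width"] <- NA

# Impute predictors using random forest proximities.
imputed <- rfImpute(Species ~ ., data = iris_na, iter = 5, ntree = 300)

# Fit the forest on the completed data.
rf <- randomForest(Species ~ ., data = imputed, ntree = 500, importance = TRUE)
print(rf)
```

rfImpute() prints one line of OOB error per iteration, which is the format of the "ntree OOB 1 2 3 ..." lines under the imputed data table.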